#!/usr/bin/env python
# coding: utf-8

# ## The Impact of COVID-19 on Technical Services Units
# 
# These survey results are compiled from a survey conducted from October 18, 2021 to January 1, 2022 regarding the measurable impacts of COVID-19 on technical services units within the United States. The survey was widely distributed via Oklahoma library listservs, as well as through discussion groups in ALA Connect and technical services or cataloging-specific listservs or groups. 
# 
# Data was downloaded from Qualtrics (the survey gathering tool) into an Exel file. Due to a techncial malfunction, IP addresses were collected by Qualtrics, despite setting up the survey to exclude this data. As such, the data is treated as sensitive information. This initial file was saved as the raw data. The raw version of this file containing possibly identifiable information is stored in an encrypted file. Note that the encryption key for this file is stored offline. The duplicated and anonymized version of this data is utilized as the "raw data file" and is available in a raw, CSV format within the Harvard Dataverse here: https://doi.org/10.7910/DVN/ASTFMH. 
# 
# The origin clean version of data removed fields auto-populated by Qualtrics, including possibly identifiable infomation columns (StartDate, EndDate, Status, IPAddress, Progress, Duration (in seconds), Finished, RecordedDate, ResponseId, RecipientLastName, RecipientFirstName, RecipientEmail, ExternalReference, LocationLatitude, LocationLongitude, DistributionChannel, UserLanguage). The cleaned version of this data is available within the Harvard Dataverse here: https://doi.org/10.7910/DVN/BMHGPY. 
# 
# Data was then standardized in a last pass to remove completely-blank rows were removed within Excel. The raw and origin clean files contain 832 total rows of data; 72 completely-blank rows were removed, resulting in 760 remaining data rows. Two additional rows of data containing the survey questions and ImportId were also removed to streamline the data analysis process, resulting in a final tally of 758 rows of data. Qualtric's unordered question ennumeration (Q10, Q40, Q33, et cetera) was replaced with a new numberical order (Q1, Q2, Q3, et cetera) to help with data analysis shorthand in Python. 
# 
# Three versions of the clean file were saved into CSV formats: Impact_of_COVID_on_Tech_Services_Clean_1.CSV, the version of the file containing questions and Q-numbers, Impact_of_COVID_on_Tech_Services_Clean_2.CSV, the version of the file lacking the actual survey questions, thus relying on Q-numbers to correlate to the first of the two clean files, and Impact_of_COVID_on_Tech_Services_Clean_3.CSV, which removed all responses that did not answer past Q4 (the first section of the survey). Four Q36 (What state is your library located in?) rows were also removed for being outside the scope of the study, as respondees were located in Australia or Canada. Impact_of_COVID_on_Tech_Services_Clean_3.CSV also standardizes all state names. 
# 
# All three files were saved separately to demonstrate the major file changes that took place during data standardization with Impact_of_COVID_on_Tech_Services_Clean_3.CSV ultimately used for the bulk of subsequent data analyses. All three files are availalbe within the Harvard Dataverse here: https://doi.org/10.7910/DVN/DGBUV7.  
# 
# This Python script seeks to: fill in NaN responses where data is missing, to analyze the data holistically for trends (many libraries had a measureable COVID-19 impact on their technical services units/procedures or did not, et cetera). 
# 
# The analyzed dataset was later clustered by library type to identify type-trends, staffing numbers pre- and post-COVID to standardize numbering outside of the cleaned datasets, and by qualitative responses for Q12, Q20, and Q34 so reoccurring words could be counted.
# 
# Links to related Python script for this study:
# COVID_Impact_TechnicalServices_HolisticAnalysis (a holistic analysis of all survey data): https://doi.org/10.7910/DVN/SXMSDZ
# COVID_Impact_TechnicalServices_LibraryTypeAnalysis (a clustered analysis of impact by library type): https://doi.org/10.7910/DVN/SXMSDZ
# 
# The researcher specifically aims to invesigate whether COVID-19 had an immediate or lasting impact on library technical services units or procedures and aims to keep this investigation in mind when analyzing data. 

# In[1]:


#Setup Numpy and Pandas
import pandas as pd
import numpy as np

#Import Matplotlib
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')


# In[2]:


#Import File
cov=pd.read_csv('Impact_of_COVID_on_Tech_Services_Clean_3.csv')

print(cov)


# In[52]:


#Data Analysis
print(cov.head()) #Column names, data samples
print(cov.tail()) #Tail sample, 662 rows
cov.describe() #5 rows with 38 columns
type(cov) #Confirming is dataframe type

#cov.dtypes is Object for all 38 columns


# In[53]:


#Question 1 Value Counts
#Did your technical services unit or any members within your unit work remotely during the COVID-19 pandemic?
Q1=cov['Q1'].value_counts()
print(Q1)


# In[54]:


#Question 2 Value Counts
#Is your technical services unit or any members within your unit still working remotely because of the COVID-19 pandemic?
Q2=cov['Q2'].value_counts()
print(Q2)


# In[55]:


#Question 3 Value Counts
#If your department worked remotely during the COVID-19 pandemic, how long did they work remotely for?
Q3=cov['Q3'].value_counts()
print(Q3)


# In[56]:


#Question 4 Value Counts
#Has your technical services unit or any members within your unit migrated to part-time or full-time remote work post-COVID-19?
Q4=cov['Q4'].value_counts()
print(Q4)


# In[57]:


#Question 5 Value Counts
#Which areas is your technical services team comprised of? If you do not work in a technical services team or unit (e.g. a solo librarian), which technical tasks do you oversee?
#Please select all that apply.

#Convert column type to string
cov['Q5'] = cov['Q5'].astype('str') 

#Tallied areas of technical services teams
cov['Q5'].str.split(',', expand=True).stack().value_counts()


# In[58]:


#Question 6 and 7 Counts
#Data in Questions 6 and 7 pertaining to staffing levels pre- and post-COVID were extracted into a new CSV file (titled Impact_of_COVID_on_Tech_Services_Staffing_Clean.csv), to standardize numbering for mathematical data analysis. 
staffing=pd.read_csv('Impact_of_COVID_on_Tech_Services_Staffing_Clean_3.csv')
print(staffing)


# In[59]:


#Question 6
#"How many full and part-time staff members (not student workers) were on your Technical Services team prior to the COVID-19 pandemic?
#If you are a solo librarian, the only individual who oversees technical tasks in your library, et cetera, please indicate ""1""."
Q6=staffing['Q6']
print(Q6)

#Average
staffing2=staffing['Q6'].mean()
print(staffing2)

#Sum
staffing3=staffing['Q6'].sum()
print(staffing3)


# In[60]:


#Question 7
#How many full and part-time staff members (not student workers) are on your technical services team now?
#If you are a solo librarian, the only individual who oversees technical tasks in your library, et cetera, please indicate "1".
Q7=staffing['Q7']
print(Q7)

#Average
staffing4=staffing['Q7'].mean()
print(staffing4)

#Sum
staffing5=staffing['Q7'].sum()
print(staffing5)


# In[61]:


#Question 8 Value Counts
#"Did your technical services unit experience furloughs, lay-offs, voluntary retirements, or similar as a direct result of the COVID-19 pandemic?
#Please select all that apply."

#Convert column type to string
cov['Q8'] = cov['Q8'].astype('str') 

#Tallied areas of technical services teams
cov['Q8'].str.split(',', expand=True).stack().value_counts()


# In[62]:


#Question 9 Value Counts
#If applicable, has your technical services unit recovered any positions lost to furloughs, lay-offs, voluntary retirement, or similar specifically related to COVID-19? This may include refilling positions vacated because of the COVID-19 pandemic.

Q9=cov['Q9'].value_counts()
print(Q9)


# In[63]:


#Question 10 Value Counts
#Did your technical services unit lose a significant amount of institutional knowledge or skill as a direct result of COVID-19 staffing changes?

Q10=cov['Q10'].value_counts()
print(Q10)


# In[64]:


#Question 11 Value Counts
#Has your technical service unit or library struggled with workload, backlog, maintaining quality of service, or similar issues as a direct result of the impacts of the COVID-19 pandemic?

Q11=cov['Q11'].value_counts()
print(Q11)


# In[65]:


#Question 12 Value Counts
#If applicable, what has been the impact of furloughs, lay-offs, or voluntary retirement on your technical services unit as a result of the COVID-19 pandemic?

#Open-Ended Answer


# In[66]:


#Question 13 Value Counts
#Did your library lose funding or a portion of funding that impacted work for your technical services unit?

Q13=cov['Q13'].value_counts()
print(Q13)


# In[67]:


#Question 14 Value Counts
#"If your library lost funding or a portion of funding that impacted work for your technical services unit, what did that funding go towards? 
#Please select all that apply."

#Convert column type to string
cov['Q14'] = cov['Q14'].astype('str') 

#Tallied areas of technical services teams
cov['Q14'].str.split(',', expand=True).stack().value_counts()


# In[68]:


#Question 15 Value Counts
#If you selected "Other", please explain:

#Open-Ended Question


# In[69]:


#Question 16 Value Counts
#"Did your technical services unit encounter other disruptions to normal staffing or salaries because of the COVID-19 pandemic? 
#Please select all that apply."

#Convert column type to string
cov['Q16'] = cov['Q16'].astype('str') 

#Tallied areas of technical services teams
cov['Q16'].str.split(',', expand=True).stack().value_counts()


# In[70]:


#Question 17 Value Counts
#Did your library or technical services unit outsource portions or all of its technical tasks to a vendor during the COVID-19 pandemic?

Q17=cov['Q17'].value_counts()
print(Q17)


# In[71]:


#Question 18 Value Counts
#If your library or technical services unit outsourced portions or all of its technical tasks to a vendor during the COVID-19 pandemic, was it as a direct result of the COVID-19 pandemic?

Q18=cov['Q18'].value_counts()
print(Q18)


# In[72]:


#Question 19 Value Counts
#Did your library restrict or revise their services during the COVID-19 pandemic?

Q19=cov['Q19'].value_counts()
print(Q19)


# In[73]:


#Question 20 Value Counts
#How did these restrictions or revisions impact technical services work?

#Open-Ended Question


# In[74]:


#Question 21 Value Counts
#Did members of your technical services unit work on special remote projects or clean-up projects during the COVID-19 pandemic?

Q21=cov['Q21'].value_counts()
print(Q21)


# In[75]:


#Question 22 Value Counts
#"If so, what types of projects did your technical services unit work on during the COVID-19 pandemic? 
#Please select all that apply."

#Convert column type to string
cov['Q22'] = cov['Q22'].astype('str') 

#Tallied areas of technical services teams
cov['Q22'].str.split(',', expand=True).stack().value_counts()


# In[76]:


#Question 23 Value Counts
#If you selected "Other", please explain:

#Open-Ended Question


# In[77]:


#Question 24 Value Counts
#Has your technical services unit returned to their pre-COVID-19 ("normal") services yet?

Q24=cov['Q24'].value_counts()
print(Q24)


# In[78]:


#Question 25 Value Counts
#If you selected "Other", please explain:

#Open-Ended Question


# In[79]:


#Question 26 Value Counts
#Has your technical services unit decided to keep a project, process, or workflow implemented during the COVID-19 pandemic as a permanent addition to prior services?

Q26=cov['Q26'].value_counts()
print(Q26)


# In[80]:


#Question 27 Value Counts
#If you selected "Other", please explain:

#Open-Ended Question


# In[81]:


#Question 28 Value Counts
#"Would your technical services unit have adopted these changes regardless of the COVID-19 pandemic? 
#For example, planning a migration that would have proceeded regardless of working from home."

Q28=cov['Q28'].value_counts()
print(Q28)


# In[82]:


#Question 29 Value Counts
#"If your technical services unit decided to keep a project, process, or workflow implemented during the COVID-19 pandemic as a permanent addition to prior services, what type of change has been adopted? 
#Please select all that apply."

#Convert column type to string
cov['Q29'] = cov['Q29'].astype('str') 

#Tallied areas of technical services teams
cov['Q29'].str.split(',', expand=True).stack().value_counts()


# In[83]:


#Question 30 Value Counts
#If you selected "Other", please explain:

#Open-Ended Question


# In[84]:


#Question 31 Value Counts
#In your personal opinion, have these changes been successful?

Q31=cov['Q31'].value_counts()
print(Q31)


# In[85]:


#Question 32 Value Counts
#In your personal opinion, will these changes have a lasting impact on your technical services unit and the services provided by that unit?

Q32=cov['Q32'].value_counts()
print(Q32)


# In[86]:


#Question 33 Value Counts
#In your personal opinion, how do you view these changes?

Q33=cov['Q33'].value_counts()
print(Q33)


# In[87]:


#Question 34 Value Counts
#"Do you have any concerns about your position, department/unit, or services offered because of these changes?
#Please skip this question if it is not applicable."

#Open-Ended Question


# In[88]:


#Question 35 Value Counts
#Are you a solo librarian, school librarian, and/or librarian working in a small or rural library who may wear many hats, including handling some or all of the technical services tasks for your library?

Q35=cov['Q35'].value_counts()
print(Q35)


# In[4]:


#Question 36 Value Counts
#What state is your library located in?

Q36=cov['Q36'].value_counts()
print(Q36)

#Export to CSV for Visualization
Q36.to_csv('Q36.csv', sep=',')


# In[90]:


#Question 37 Value Counts
#What type of library do you work in?
#Library Types
Q37=cov['Q37'].value_counts()
print(Q37)


# In[91]:


#Question 38 Value Counts
#What is your role at that library?

#Convert column type to string
cov['Q38'] = cov['Q38'].astype('str') 

#Tallied areas of technical services teams
cov['Q38'].str.split(',', expand=True).stack().value_counts()


# In[43]:


#Data Visualizations
#Grouped Bar Graphs For Before/Afters, Library Comparison Types
#Pie Chart for Total Unit Comparisons
#Stacked Bar Charts for Likert Questions


# In[45]:


#Areas Comprising Technical Services Units
numbers=[616, 550, 459, 383, 301, 258, 202, 160, 76, 66]
areas=['Cataloging', 'Processing', 'Acquisitions', 'Electronic Resources',
       'Collection Development', 'Library Systems', 'Interlibrary Loan', 
       'Archives/Special Collections', 'Federal Depository', 'Other']

df=pd.DataFrame({'Numbers':numbers, 'Areas': areas})

df.plot.bar(x='Areas', y='Numbers', fontsize=8, rot=90, legend=False, 
            title='Areas Comprising Technical Services Units')

plt.show()

plt.savefig('TS_Units.png')


# In[46]:


#Green Figures
groups=['Pre-COVID','Post-COVID']
numbers=[4040.1, 3674.3]

groups_2=['Pre-COVID','Post-COVID']
numbers_2=[6.4, 5.8]

fig, axes=plt.subplots (nrows=1, ncols=2, figsize=(12,6))

axes[0].bar(groups, numbers,color='darkgreen')
axes[0].set_xlabel('Pre- and Post-COVID')
axes[0].set_ylabel('Total Survey Sum')
axes[0].set_title('Sum of All Positions')

axes[1].bar(groups_2, numbers_2,color='darkgreen')
axes[1].set_xlabel('Pre- and Post-COVID')
axes[1].set_ylabel('Total Survey Average')
axes[1].set_title('Average of All Positions')

fig.suptitle('Technical Services Positions Pre- and Post-COVID')


#Blue Figures
groups=['Pre-COVID','Post-COVID']
numbers=[4040.1, 3674.3]

groups_2=['Pre-COVID','Post-COVID']
numbers_2=[6.4, 5.8]

fig, axes=plt.subplots (nrows=1, ncols=2, figsize=(12,6))

axes[0].bar(groups, numbers)
axes[0].set_xlabel('Pre- and Post-COVID')
axes[0].set_ylabel('Total Survey Sum')
axes[0].set_title('Sum of All Positions')

axes[1].bar(groups_2, numbers_2)
axes[1].set_xlabel('Pre- and Post-COVID')
axes[1].set_ylabel('Total Survey Average')
axes[1].set_title('Average of All Positions')

fig.suptitle('Technical Services Positions Pre- and Post-COVID')

plt.savefig('TS_Postions_Pre_Post.png')


# In[47]:


#Green Figures
groups=['Pre-COVID','Post-COVID']
numbers=[4040.1, 3674.3]

groups_2=['Pre-COVID','Post-COVID']
numbers_2=[6.4, 5.8]

fig=plt.figure()
axes=fig.add_axes([0,0,1,1])
axes.bar(groups, numbers,color='darkgreen')
# axes.plot(color='darkgreen')
axes.set_xlabel('Pre- and Post-COVID')
axes.set_ylabel('Total Survey Sum')
axes.set_title('Pre- and Post-COVID Sum of All Positions')

fig=plt.figure()
axes=fig.add_axes([0,0,1,1])
axes.bar(groups_2, numbers_2,color='darkgreen')
# axes.plot(color='darkgreen')
axes.set_xlabel('Pre- and Post-COVID')
axes.set_ylabel('Averages for All Respondents')
axes.set_title('Average Positions Pre- and Post-COVID')

#Blue Figures
groups=['Pre-COVID','Post-COVID']
numbers=[4040.1, 3674.3]

groups_2=['Pre-COVID','Post-COVID']
numbers_2=[6.4, 5.8]

fig=plt.figure()
axes=fig.add_axes([0,0,1,1])
axes.bar(groups, numbers)
axes.set_xlabel('Pre- and Post-COVID')
axes.set_ylabel('Total Survey Sum')
axes.set_title('Pre- and Post-COVID Sum of All Positions')

fig=plt.figure()
axes=fig.add_axes([0,0,1,1])
axes.bar(groups_2, numbers_2)
axes.set_xlabel('Pre- and Post-COVID')
axes.set_ylabel('Averages for All Respondents')
axes.set_title('Average Positions Pre- and Post-COVID')


# In[92]:


#By Total
labels=['Academic', 'Public', 'K-12', 'Special Collections/Archives', 'Other', 'Blanks']
before_covid=[1939.75, 1626.25, 86, 136, 225.6, 145.5]
after_covid=[1685.75, 1516.2, 66, 131, 202.8, 127]

x=np.arange(len(labels))
width=0.35

fig, ax=plt.subplots(figsize=(12, 8))
rects1=ax.bar(x - width/2, before_covid, width, label='Pre-COVID')
rects2=ax.bar(x + width/2, after_covid, width, label='Post-COVID')

ax.set_ylabel('Library Type')
ax.set_xlabel('Library Staffing Pre- and Post-COVID')
ax.set_title('Library Staffing Numbers Pre- and Post-COVID by Library Type', pad=15)
ax.legend()

ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)
ax.set_xticklabels([' ', 'Academic', 'Public', 'K-12', 'Special Collections/Archives', 'Other', 'Blanks'], 
                   rotation=45, ha='right')
ax.legend()

fig.tight_layout()

plt.show()

plt.savefig('Staffing_Numbers_PrePost_ByType.png')


# In[93]:


#By Average
labels=['Academic', 'Public', 'K-12', 'Special Collections/Archives', 'Other', 'Blanks']
before_covid=[7.5, 7.0, 2.0, 4.1, 5.0, 4.0]
after_covid=[6.6, 6.6, 1.5, 4.0, 4.4, 3.4]

x=np.arange(len(labels))
width=0.35

fig, ax=plt.subplots(figsize=(12, 8))
rects1=ax.bar(x - width/2, before_covid, width, label='Pre-COVID')
rects2=ax.bar(x + width/2, after_covid, width, label='Post-COVID')

ax.set_ylabel('Library Type')
ax.set_xlabel('Library Staffing Pre- and Post-COVID')
ax.set_title('Average Library Staffing Numbers Pre- and Post-COVID by Library Type', pad=15)
ax.legend()

ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)
ax.set_xticklabels([' ', 'Academic', 'Public', 'K-12', 'Special Collections/Archives', 'Other', 'Blanks'], 
                   rotation=45, ha='right')
ax.legend()

fig.tight_layout()

plt.show()

plt.savefig('Average_Staffing_Numbers_PrePost_ByType.png')


# In[102]:


# Most occurring words in Q12 column. 
cov.Q12.str.split(expand=True).stack().value_counts()[:50]


# In[103]:


# Most occurring words in Q20 column. 
cov.Q20.str.split(expand=True).stack().value_counts()[:50]


# In[104]:


# Most occurring words in Q34 column. 
cov.Q34.str.split(expand=True).stack().value_counts()[:50]


# In[24]:


# Data for Demographics Visualizations

# Solo Librarian
numbers=[154, 425]
responses=['Yes', 'No']

# State/Location
numbers2=[261, 238, 45, 34, 47, 46]
responses2=['Academic Library', 'Public Library', 'K-12 School Library', 'Special Collections/Archive',
           'Other', 'No Response']

# Role
numbers3=[243, 203, 88, 54, 26, 48]
responses3=['Librarian', 'Manager/Supervisor', 'Assistant/Paraprofessional', 'Dean/Director', 'Other', 'No Response']


# In[ ]:


# State/Location
# Opted not to include State/Location visualization to avoid too busy of a visualization. 

# Utilized exported Q36 file to alphabetize results before copying/pasting into list
numbers4=[3, 3, 3, 5, 25, 8, 6, 2, 8, 7, 8, 1, 2, 1, 89, 21, 8, 3, 5, 4, 1, 9, 
          16, 8, 8, 1, 1, 3, 2, 8, 3, 23, 11, 2, 24, 136, 7, 20, 1, 3, 2, 1, 4,
          26, 3, 12, 13, 3]
responses4=['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
            'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia', 
            'Government library in all states', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 
            'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts',
            'Michigan', 'Minnesota', 'Missouri', 'Nebraska', 'Nevada', 'New Hampshire', 
            'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 
            'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina',
            'South Dakota', 'Southeast', 'Tennessee', 'Texas', 'Utah', 'Virginia', 'Washington',
            'Wisconsin']

axes[1].bar(Q36)
axes[1].set_xlabel('State Served')
axes[1].set_ylabel('Number of Responses')
axes[1].set_title('State or Location of Repondent')


# In[29]:


# Visualizations for Demographics
fig, axes=plt.subplots (nrows=1, ncols=3, figsize=(12,6))

# Solo Librarian
axes[0].bar(responses, numbers)
axes[0].set_xlabel(' ')
axes[0].set_ylabel('Number of Responses')
axes[0].set_title('Are You a Solo Librarian?')

# Library Type
axes[1].bar(responses2, numbers2)
axes[1].set_xlabel(' ')
axes[1].set_ylabel('Number of Responses')
axes[1].set_title('Repository Type')

# Role
axes[2].bar(responses3, numbers3)
axes[2].set_xlabel(' ')
axes[2].set_ylabel('Number of Responses')
axes[2].set_title('Repondent Role at Library')

for ax in fig.axes:
    ax.tick_params(labelrotation=90)
    
fig.suptitle('Survey Respondent Demographics')

